Distributed Multivariate Regression Using Wavelet-Based Collective Data Mining

Authors

  • Daryl E. Hershberger
  • Hillol Kargupta
Abstract

This paper presents a method for distributed multivariate regression using wavelet-based Collective Data Mining (CDM). The method blends machine learning and the theory of communication with the statistical methods employed in parametric multivariate regression to provide an effective data mining technique for use in a distributed data and computation environment. The technique is applied to two benchmark data sets, producing results that are consistent with those obtained by applying standard parametric regression techniques to centralized data sets. The method is evaluated in terms of model accuracy as a function of the appropriateness of the selected wavelet function, the relative number of non-linear cross-terms, and the sample size. This evaluation demonstrates that accurate parametric multivariate regression models can be generated from distributed, heterogeneous data sets with minimal data communication overhead compared to that required to aggregate a distributed data set. Application of this method to Linear Discriminant Analysis, which is related to parametric multivariate regression, produced classification results on the Iris data set that are comparable to those obtained with centralized data analysis.
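The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes two sites that each hold one feature column for the same observations, an orthonormal Haar wavelet, and an agreed rule that every site transmits the same `k` coarsest coefficients. Because an orthonormal transform preserves the least-squares problem, a regression on the transformed columns yields the same coefficients as a centralized fit, and a fit on only `k` shared coefficients approximates it with far less communication. The names `haar_matrix`, `site_summary`, and `fit` are hypothetical, the data are synthetic, and non-linear cross-terms are not handled here.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix; n must be a power of two."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                   # coarser-scale rows
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])  # finest detail rows
    return np.vstack([top, bottom]) / np.sqrt(2.0)

def site_summary(column, k):
    """What one site transmits (assumed rule): its k coarsest Haar coefficients."""
    return (haar_matrix(len(column)) @ column)[:k]

def fit(columns, response):
    """Ordinary least squares on the given design columns."""
    design = np.column_stack(columns)
    return np.linalg.lstsq(design, response, rcond=None)[0]

# Synthetic, vertically partitioned data (illustrative only).
rng = np.random.default_rng(0)
n, k = 256, 32
t = np.linspace(0.0, 1.0, n)
x1 = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(n)  # held by site A
x2 = np.cos(4 * np.pi * t) + 0.1 * rng.standard_normal(n)  # held by site B
y = 1.5 + 2.0 * x1 - 3.0 * x2 + 0.05 * rng.standard_normal(n)
ones = np.ones(n)  # intercept column, known to the central site

# Centralized baseline: all raw data in one place.
beta_central = fit([ones, x1, x2], y)

# Wavelet domain with every coefficient: identical to the baseline.
beta_wavelet_full = fit([site_summary(c, n) for c in (ones, x1, x2)],
                        site_summary(y, n))

# Wavelet domain with only k coefficients per column sent by each site.
beta_cdm = fit([site_summary(c, k) for c in (ones, x1, x2)],
               site_summary(y, k))

print("centralized:      ", beta_central)
print("k-coefficient fit:", beta_cdm)
print("values sent per column:", k, "instead of", n)
```

In this sketch each site sends `k` numbers per column rather than `n`, which is the kind of communication saving the abstract refers to. How the coefficients are actually selected, and how cross-terms between features at different sites are estimated, is described in the paper itself and is not reproduced here.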

Similar resources

Collective Data Mining: A New Perspective Toward Distributed Data Mining

This paper introduces collective data mining (CDM), a new approach to distributed data mining (DDM) from heterogeneous sites. It points out that naive approaches to distributed data analysis in a heterogeneous environment may face ambiguous situations and may lead to an incorrect global data model. It also observes that any function can be expressed in a distributed fashion using a set of a...

A Scalable Local Algorithm for Distributed Multivariate Regression

This paper offers a local distributed algorithm for multivariate regression in large peer-to-peer environments. The algorithm can be used for distributed inferencing, data compaction, data modeling and classification tasks in many emerging peer-to-peer applications for bioinformatics, astronomy, social networking, sensor networks and web mining. Computing a global regression model from data ava...

Application of non-linear regression and soft computing techniques for modeling process of pollutant adsorption from industrial wastewaters

The process of pollutant adsorption from industrial wastewaters is a multivariate problem. This process is affected by many factors including the contact time (T), pH, adsorbent weight (m), and solution concentration (ppm). The main target of this work is to model and evaluate the process of pollutant adsorption from industrial wastewaters using the non-linear multivariate regression and intell...

Estimation of Reference Evapotranspiration Using Artificial Neural Network Models and the Hybrid Wavelet Neural Network

Estimation of evapotranspiration is essential for planning, designing and managing irrigation and drainage schemes, as well as water resources management. In this research, artificial neural networks, neural network wavelet model, multivariate regression and Hargreaves' empirical method were used to estimate reference evapotranspiration in order to determine the best model in terms of efficienc...

Clustered Collaborative Filtering Approach for Distributed Data Mining on Electronic Health Records

Distributed Data Mining (DDM) has become one of the promising areas of data mining. DDM techniques include the classifier approach and the agent approach. The classifier approach plays a vital role in mining distributed data, with homogeneous and heterogeneous variants depending on the data sites. The homogeneous classifier approach involves ensemble learning, distributed association rule mining, meta-learning an...

Journal title:
  • J. Parallel Distrib. Comput.

Volume 61  Issue 

Pages  -

Publication date 2001